Abstract

Stereotypical reasoning assumes that the situation at hand is one of a 
kind and that it enjoys the properties generally associated with that kind 
of situation. It is one of the most basic forms of nonmonotonic reasoning. 
A formal model for stereotypical reasoning is proposed and the logical 
properties of this form of reasoning are studied. Stereotypical reasoning 
is shown to be cumulative under weak assumptions.

Keywords: Prototypical Reasoning, Stereotypical Reasoning, Nonmonotonic
Consequence Relations.



1 Introduction 

Common sense reasoning in AI requires drawing inferences in a bolder, more
adventurous way than mathematical reasoning does. Many different formalisms that
implement some form of bold reasoning have been proposed, implemented, and used
to build artificial systems. Almost no work has been done comparing those
formalisms with the way natural intelligence deals with those tasks. Minsky [5],
[?] probably represents one of the few efforts to model reasoning performed by
natural intelligence.

During the last decades, philosophers, linguists and sociologists have
revolutionized the way we understand the human mind. Putnam [?] has criticized the
classical philosophical assumptions about meaning, and claimed that
stereotypes are a necessary component of the meaning of terms. Rosch [?], [?], [10] has
brought out the essential function of categorization in achieving intelligence
and the intricate ways in which we use it. Categorization is the process in which
we relate a specific object or situation to the kind we shall think it a member
of. She showed that many of our categories have prototypes, i.e., best examples.
Lakoff [?] summarizes and expands much of this line of research.

The purpose of this work is to begin the study of inferencing in a mind that
uses categories as described above. A model of the simplest kind of inferencing
using stereotypes will be given and the formal properties of the inferencing
process will be studied. The formal properties of inferencing that are of interest
have been singled out by [?], [2], [?].

2 Stereotypical Reasoning 

In this work, stereotypical reasoning is used to denote what is probably the 
simplest form of natural nonmonotonic reasoning. The present use of the word 
stereotype is very closely related to Putnam's stereotypes. He claimed that 
stereotypes are a necessary part of the meaning of words denoting a natural 
kind. Here, stereotypes are assumed not only for natural kinds but for any state 
of information. This could be understood as the assumption that stereotypes 
are part of the meaning of any sentence, but the philosophical aspects of this 
assumption are not discussed in this paper. The point of this paper is the study 
of how stereotypes are used in the inferencing process, and the formal properties 
of the inferencing resulting from the use of stereotypes. The use of stereotypes 
has not been discussed by Putnam. 

What is called here stereotypical reasoning is very closely related to the use 
of what Rosch calls prototypical categories. Prototypical alludes, though, to a 
richer structure than stereotypical and this is the reason the latter term has been 
preferred. In ordinary parlance stereotypes are considered to be typically wildly 
inaccurate and an impediment to intelligent thinking. This reputation should 
not hide the fact that the use of stereotypes is a fundamental tool, probably 
the central tool, in achieving intelligence. Hence, the importance of its study. 
Nevertheless, the negative connotation attached to the word stereotype should 
remind us we are studying a limited form of reasoning, certainly not capable of 
exhibiting all forms of intelligence. 

Here is an example of what I will call stereotypical reasoning. The choice of 
the tiger stereotype follows Putnam. If Benjamin tells you that during his trip 
in India, hiking in the jungle, he saw a tiger, you will assume he saw a large, 
frightening animal, yellow with black stripes. Note that not all tigers are such. 
Some tigers are small, dead, or albino. You have been using the stereotype that 
says that tigers are big, dangerous and yellow with black stripes. The use of
this stereotype may be a mistake: the end of the story may reveal this was an
albino tiger. Typically, though, the use of the stereotype is precisely what enables
efficient communication: since Benjamin knows you have this stereotype (as he
has), he assumes you will draw the corresponding conclusions, and he intends
you to draw those conclusions.

This simple example already suggests a number of questions, most of which
will not be touched upon in this paper. What is the nature, or the structure,
of the stereotype tiger? Is it just a conjunction (or some other composition)
of properties? In this work, we shall assume that it is: a stereotype is a set of
possible states of affairs, but the proper treatment of prototypical categories in
general may need a more sophisticated structure. Note, though, that we are
not assuming (the classical view, attacked by Lakoff and others) that categories
are sets (of models), far from that. We are only assuming that stereotypes are
sets of models. In fact, the way stereotypes are used makes them function very
much like the graded or radial categories of Lakoff.

How are such stereotypes acquired? This is certainly both deeply rooted in 
our physiology and a social process. 

Why is this the right stereotype for tiger? Why is it any better than some 
other? Here, an analysis of rationality and utility is certainly needed. 

What made you apply the tiger stereotype to the little story above and not, 
for example, the jungle stereotype that says that, in the jungle, everything is 
dark green, or the India stereotype, whatever this is for you? In this paper this 
choice will be modeled by some distance between the information at hand and 
the stereotype. We shall not be able to explain why a specific distance is used. 
Such an explanation would certainly be based on utility considerations. 

3 A formal model of stereotypical reasoning 

Informally, starting from information about the situation at hand, one chooses 
the best stereotype to fit the information and uses both the original information 
and the stereotypical information to draw conclusions. 

Formally, we assume W is the set of all possible states of affairs (i.e., models
or situations). We assume a collection (not necessarily finite, but it may well be
finite) of stereotypes, $S_i$. Notice we use lower indexes to identify the stereotypes.
Each stereotype is a subset of W, the set of situations in which the stereotype
holds. For example, the tiger stereotype, $S_{tiger}$, could be the set of all models in
which tigers are frightening live animals, yellow with black stripes.

The user has some information, i.e., facts, about the situation at hand. This
information is modeled by a subset F of W: the set of all situations compatible
with the information at hand. On the basis of F, the reasoner picks one of
the stereotypes: $S^F$, the stereotype most appropriate to F, in a way that will be
discussed later. Notice we use here an upper index to denote the stereotype that
best fits some information F. The reasoner will then conclude that the actual
state of affairs is one of the members of the intersection $F' = F \cap S^F$. The
nonmonotonicity of the reasoning stems from this jump from F to the subset
F'. Clearly, we do expect the set F' to be non-empty, assuming F is non-empty,
since we want to avoid jumping to contradictory conclusions. It will be the task
of the function that defines the best stereotype to pick a stereotype that has a
non-empty intersection with the information F at hand. In many cases the facts
F are given by a formula $\alpha$ that is known to be true. In this case F is the set
of all models that satisfy $\alpha$. We shall identify the formula $\alpha$ and the set of its
models and write $S^\alpha$ for the stereotype most appropriate for the set of models
of $\alpha$. A formula $\beta$ is then nonmonotonically deduced from $\alpha$ iff it is satisfied by
all elements of F', that is, iff any model m in the set $S^\alpha$ that satisfies $\alpha$ satisfies
$\beta$: $\alpha \mid\!\sim \beta$ iff $\forall m \in S^\alpha$, $m \models \alpha$ implies $m \models \beta$.
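
The definition above can be illustrated by a minimal Python sketch. The three worlds, the two stereotypes and the hand-written choice function `choose` are hypothetical toy data, not part of the model; how the best stereotype is chosen is only discussed later, so the choice function is simply passed in as a parameter.

```python
from typing import Callable, FrozenSet

World = str
Worlds = FrozenSet[World]

# Hypothetical toy universe W and stereotypes (illustration only).
W: Worlds = frozenset({"striped_tiger", "albino_tiger", "green_parrot"})
S_TIGER: Worlds = frozenset({"striped_tiger"})
S_ANYTHING: Worlds = W


def choose(facts: Worlds) -> Worlds:
    """Assumed choice function F -> S^F: use the tiger stereotype when it
    is consistent with the facts, otherwise fall back to the trivial one."""
    return S_TIGER if facts & S_TIGER else S_ANYTHING


def conclusions(facts: Worlds, best: Callable[[Worlds], Worlds]) -> Worlds:
    """Return F' = F ∩ S^F, the worlds the reasoner still considers."""
    return facts & best(facts)


def entails(facts: Worlds, best: Callable[[Worlds], Worlds],
            prop: Callable[[World], bool]) -> bool:
    """facts |~ prop iff prop holds in every element of F'."""
    return all(prop(w) for w in conclusions(facts, best))


def striped(w: World) -> bool:
    return w.startswith("striped")


saw_a_tiger = frozenset({"striped_tiger", "albino_tiger"})
saw_an_albino = frozenset({"albino_tiger"})  # strictly more information

print(entails(saw_a_tiger, choose, striped))    # True: the stereotype applies
print(entails(saw_an_albino, choose, striped))  # False: the conclusion is retracted
```

The second call adds information and loses a conclusion, which is exactly the nonmonotonic jump from F to F' described above.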

Syntactically, this may be described as taking for $C(X)$, the set of nonmonotonic
consequences of a set X of formulas, the set $Cn(X, g(X))$ of all formulas
that logically follow from the set $X \cup g(X)$, where $g(X)$ is the set of formulas
that hold in all models of the stereotype that best fits X.

Our analysis of stereotypical reasoning will use the simplistic model just 
described, since it is good enough for the purpose of this paper. If one thinks 
of first-order languages and models, one may want to refine this model and 
associate a stereotype with each one of the objects of the structure: e.g., if 
our story refers to two tigers about which one has different information, one
will perhaps use a different stereotype for each of the tigers: a mother tiger
stereotype and a tiger cub stereotype, for example.

Before we analyze some consequences of this model, let us point out some of 
its basic limitations. It is assumed that the conclusions from facts F are drawn 
by identifying a unique stereotype most appropriate for F. One may ask whether
this should be the case. Instead of picking a single stereotype, perhaps one
should consider the set of all most appropriate stereotypes and use them all,
i.e., their intersection. Indeed, the results of the next section would also hold in
this more general model, but the uniqueness assumption will be needed later.
Intuitively it seems to me that we do pick up a unique stereotype, sometimes 
made up of different stereotypes, but that this composition is almost never the 
simple juxtaposition, i.e., conjunction of stereotypes. The main reason probably 
is that such conjunctions are very often empty and we certainly want to avoid 
drawing inconsistent conclusions from consistent facts. Consider, for example, 
our tiger above. We did not use both the tiger and the jungle stereotypes 
because they clash about the color of the tiger: yellow and black vs. dark 
green. It may be the case that a very smart reasoner will use a tiger in the 
jungle stereotype, that implies the tiger is barely visible, but this stereotype, 
though including, somehow, both the tiger and the jungle stereotypes, cannot 
be reduced to their conjunction. To avoid premature commitment to a theory 
of the formation of compound stereotypes, this paper will just assume any set 
F of facts is associated with a unique stereotype. 

4 First consequences of the model 

As general as it is, the model presented above has some important consequences
for the formal properties of the process of nonmonotonic deduction it defines,
i.e., of the consequence relation $\mid\!\sim$.

First, since what is defined by facts F is a set of models, F', the set of
nonmonotonic consequences of F, i.e., the set of formulas that hold in all elements
of F', is a logical theory, i.e., closed under logical consequence. In other terms,
the relation $\mid\!\sim$ satisfies the rules of Right Weakening and And of [?]. Secondly,
since F' is a subset of F, any formula that is logically implied by F holds in all
elements of F', or, the relation $\mid\!\sim$ satisfies the Reflexivity of [?]. Lastly, since
the information at hand is represented, semantically, by a set of models, the
relation $\mid\!\sim$ satisfies Left Logical Equivalence.

5 Further assumptions



The main purpose of this work is to consider whether other, more sophisticated,
logical properties may be expected from stereotypical reasoning. It is clear that,
in the very general model described above, without further assumptions, nothing
more can be expected: given any relation $\mid\!\sim$ satisfying Left Logical Equivalence,
Reflexivity, Right Weakening and And, one may define, for any formula $\alpha$, the
stereotype $S^\alpha$ to be the set of all models that satisfy all the formulas $\beta$ such
that $\alpha \mid\!\sim \beta$. Since the relation $\mid\!\sim$ is reflexive, all models of $S^\alpha$ satisfy $\alpha$ and, for
any $\alpha$, $S^\alpha \subseteq F_\alpha$, where $F_\alpha$ is the set of models of $\alpha$. Therefore
$F_\alpha \cap S^\alpha = S^\alpha$ and the nonmonotonic consequence
relation defined by the model is exactly $\mid\!\sim$.

Our goal is to find some additional, reasonable, assumptions about the set 
of stereotypes or the way the best stereotype for a set F is chosen that will have 
interesting consequences on the nonmonotonic logic defined. In fact, the set of
stereotypes and its structure do not seem to play an important role here and
we shall concentrate on the choice of the best stereotype.

Notice that the mapping from a set F to its best stereotype $S^F$ may be very
wild. We do not expect, for example, that $F' \subseteq F$ should imply $S^{F'} \subseteq S^F$. It
may well be the case that robins is the best stereotype for birds, but the best
stereotype for antarctic birds is, for lack perhaps of knowledge of a better one,
vertebrates.

It is extremely helpful to consider the process of associating to the set F the
stereotype $S^F$ as based on some notion of distance between information sets
and stereotypes: the best stereotype for F is the stereotype closest to F:

$d(F, S^F) \leq d(F, S)$, for every stereotype S. (1)

Notice that this notion of distance is a bit unusual, since it is defined only 
from information sets to stereotypes. We shall never use the notion of the 
distance from a stereotype to an information set. The assumption that our 
choice is based on some notion of a distance does not limit the generality of 
our model, since one may always find a suitable distance to fit any choice of 
best stereotype. The interest of this assumption is that it suggests some natural 
additional assumptions on the properties of this distance. Those assumptions 
will be related to logical properties of the nonmonotonic deduction. 

Let us suppose D is a partially ordered set (of distances) and that there is a
function d that associates an element $d(F, S)$ of D with every set of models F
(in fact every set of models that could appear as an information set would be
enough) and every stereotype S. A first assumption, already described above,
is that this distance always enables us to pick a unique best stereotype.

• Assumption zero: for any given information set F there exists a unique
stereotype $S^F$ such that $d(F, S^F) \leq d(F, S)$ for any stereotype S.

A number of examples of models and choice functions will be described. They
may not be very intuitively appealing, but their purpose is to help the reader
understand our definitions and prove the consistency of the assumptions that
shall be made below. In all examples the set D of distances is taken to be the
set of integers, possibly with $\infty$ added.

Example 1 There is one stereotype only: $S_0$, and $S_0 = W$. The exact definition
of the distance is irrelevant. Assumption zero is satisfied trivially and, for any
F, $S^F = S_0 = W$. Clearly, for any F, $F' = F \cap S^F = F$, and therefore F' is
non-empty if F is. The nonmonotonic logic defined happens to be monotonic
and to be the classical one: $\alpha \mid\!\sim \beta$ iff $\alpha \vdash \beta$.

Example 2 Assume the set W is finite. Every set $S \subseteq W$ is a stereotype and

$d(F, S) = |S - F| - |S \cap F|$,

where $|A|$ indicates the cardinality of the set A. Since $d(F, F) = -|F| < d(F, S)$
for any $S \neq F$, we see that, for any F, $S^F = F$, and therefore assumption zero is
satisfied, $F' = F$ and the logic defined is the classical one, as in Example 1.

Example 3 Assume the set W is the set of natural numbers. Stereotypes
are singletons of W. Distances are defined in the following way: if $n \in F$,
$d(F, \{n\}) = n$ and if $n \notin F$, $d(F, \{n\}) = \infty$. Clearly $S^F$ is the singleton that
contains the minimal element of F, $\min(F)$, and assumption zero is satisfied.
Note also that $F' = \{\min(F)\}$ is non-empty if F is non-empty. The model boils
down to considering that world m is more probable than world n iff $m < n$. The
logic defined results in, given a set of possibilities F, jumping to the conclusion
that the most probable one must obtain. This provides a highly nonmonotonic
consequence relation.

The next example presents a simple, but natural, family of models. 

Example 4 Assume W is finite and the set of (non-empty) stereotypes $S_i$,
$i = 0, \ldots, k - 1$, provides a partition of W, i.e., $\bigcup_{i < k} S_i = W$ and $S_i \cap S_j = \emptyset$
for any $i \neq j$. Given a set F, we associate with it the stereotype $S_i$ which
covers F best, i.e., for which the size of the set $S_i - F$ is minimal. In case this
criterion does not define a unique stereotype, choose the stereotype with smallest
index. Formally we may define the distance by: $d(F, S_i) = |S_i - F| + \frac{i}{k}$. The
consequence relation defined is nonmonotonic.
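
As an illustration of Example 4, here is a small Python sketch of the partition distance and of the choice of the closest stereotype. The six-element universe and the partition are hypothetical toy data; exact fractions are used so that the tie-breaking term i/k is not subject to rounding.

```python
from fractions import Fraction
from typing import FrozenSet, Sequence

Worlds = FrozenSet[int]

# Hypothetical partition of W = {0, ..., 5} into k = 3 stereotypes.
PARTITION = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]


def distance(facts: Worlds, stereotypes: Sequence[Worlds], i: int) -> Fraction:
    """d(F, S_i) = |S_i - F| + i/k, with k the number of stereotypes."""
    return len(stereotypes[i] - facts) + Fraction(i, len(stereotypes))


def best_index(facts: Worlds, stereotypes: Sequence[Worlds]) -> int:
    """Index of the closest stereotype; since i/k < 1, ties on |S_i - F|
    are broken in favour of the smallest index."""
    return min(range(len(stereotypes)),
               key=lambda i: distance(facts, stereotypes, i))


def conclusions(facts: Worlds, stereotypes: Sequence[Worlds]) -> Worlds:
    """F' = F ∩ S^F."""
    return facts & stereotypes[best_index(facts, stereotypes)]


F = frozenset({0, 1, 2, 3})
F_stronger = frozenset({2, 3})  # a subset of F, i.e. more information

print(sorted(conclusions(F, PARTITION)))           # [0, 1]
print(sorted(conclusions(F_stronger, PARTITION)))  # [2, 3]
```

From F one concludes that the world is 0 or 1, while from the stronger information `F_stronger` one concludes that it is 2 or 3: the relation is indeed nonmonotonic.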

After these examples, let us consider interesting properties of the distance d.
Since F and S are both sets of models (subsets of W) we may, without loss
of generality, assume that $d(F, S) = e(F \cap S, S - F, F - S)$. Three additional
assumptions concerning the way the function e depends on each of its three
arguments are now natural.

• Assumption one: the function e is anti-monotone in its first argument.
I mean that if $A \subseteq A'$, then $e(A', B, C) \leq e(A, B, C)$. The relation $\subseteq$ is
the subset relation. This assumption is very natural: $d(F, S)$ measures
the closeness of F and S: the more they have in common, the closer they
are. In most cases we expect that the best stereotype for F should be
consistent with F, i.e., have a non-empty intersection with F. If this is
the case, our assumption is only slightly stronger: all other things being
equal, the best stereotype for F has the largest intersection with F. The
set $F \cap S$ represents the nonmonotonic consequences of F; we prefer
weaker consequences, therefore we prefer to take the set $F \cap S^F$ as large
as possible.

• Assumption two: the function e depends monotonically on its second
argument. Here I mean that if $B \subseteq B'$, then $e(A, B, C) \leq e(A, B', C)$. The
second argument, B = S — F measures the set of models compatible with 
the stereotype but excluded by the information. Notice that the stereo- 
type may be vague, i.e., contain a large number of elements: for example 
the bird stereotype may include birds of many colors, and the informa- 
tion at hand may exclude a lot of those elements: for example we may 
know the bird we are discussing is yellow. The more such elements are 
excluded by the information at hand, the less suitable is the stereotype: 
if too many such elements are excluded a more specific stereotype may be 
more suitable. In our example, a yellow bird stereotype, if there is one 
such stereotype, should be preferred. 

• Assumption three: the function e does not depend on its third argument. 
It seems easy to justify that the function e should depend monotonically 
on its third argument, by an argument very similar to that used for justi- 
fying assumption two. It is perhaps a little less obviously natural that e 
should not depend at all on its third argument. But, notice that the set 
F — S is a measure of the strength of our nonmonotonic inference: the 
larger it is the more nonmonotonic consequences we get in addition to the 
monotonic ones. The argument just above is to the effect we should not 
get too many such inferences, but we are certainly interested in getting 
such nonmonotonic consequences, and should not try to minimize them. 
Our assumption is that how much nonmonotonicity we get should not be 
a criterion in choosing the best stereotype. 

Assumptions one to three may be summarized by the following: for any F,
F' and any stereotypes S, S', if

$F' \cap S' \subseteq F \cap S$ and $S - F \subseteq S' - F'$ then $d(F, S) \leq d(F', S')$. (2)

Proof:

$d(F, S) = e(F \cap S, S - F, F - S) \leq e(F' \cap S', S' - F', F' - S') = d(F', S')$.
∎
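
The following Python sketch checks Equation (2) by brute force for the distance of Example 2 on a hypothetical three-element universe. It is only a sanity check under these assumptions, not a substitute for the proof.

```python
from itertools import combinations

W = (0, 1, 2)  # hypothetical small universe


def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    return [frozenset(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]


def d(facts, stereotype):
    """Distance of Example 2: |S - F| - |S ∩ F|."""
    return len(stereotype - facts) - len(stereotype & facts)


def equation_2_holds(universe):
    """Check that F' ∩ S' ⊆ F ∩ S and S - F ⊆ S' - F' imply
    d(F, S) <= d(F', S'), for all F, S, F', S' over `universe`."""
    sets = subsets(universe)
    for f in sets:
        for s in sets:
            for f2 in sets:
                for s2 in sets:
                    premise = (f2 & s2) <= (f & s) and (s - f) <= (s2 - f2)
                    if premise and not d(f, s) <= d(f2, s2):
                        return False
    return True


print(equation_2_holds(W))  # True
```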

One may notice that Equation (2) implies that $d(F, S) = d(F \cap S, S)$. Let us
consider the examples above again. In Example 1, we may define the
distance d to be constant, for example $d(F, S) = 0$. Equation (2) is obviously
satisfied. In Example 2 also, Equation (2) is obviously satisfied. In Example 3
let us check that Equation (2) is satisfied. Since stereotypes are singletons,
$F' \cap S' \subseteq F \cap S$ implies that either $F' \cap S' = \emptyset$, or $S' = S = F' \cap S' = F \cap S$.
In the first case $d(F', S') = \infty$ and the result holds. In the second case, if
$S = \{n\}$, $d(F, S) = n = d(F', S')$. For Example 4, if $F' \cap S' \subseteq F \cap S$ then,
either $F' \cap S' = \emptyset$, or $S' = S$. If $S - F \subseteq S' - F'$, then either $S - F = \emptyset$ or
$S' = S$. If $S' = S = S_i$,

$d(F, S) = |S - F| + \frac{i}{k} \leq |S' - F'| + \frac{i}{k} = d(F', S')$.

If $S' = S_j \neq S = S_i$, $F' \cap S' = \emptyset$ and $S - F = \emptyset$,

$d(F, S) = \frac{i}{k} < 1 \leq |S'| \leq |S'| + \frac{j}{k} = d(F', S')$.

Equation (2) is satisfied.

In the sequel we shall assume, sometimes without recalling this explicitly,
that the distance d satisfies Equation (2).

Our main result is that stereotypical reasoning yielded by a distance that
satisfies the four assumptions above, i.e., uniqueness of the closest stereotype,
antimonotonicity of the distance $d(F, S)$ in $F \cap S$, monotonicity in $S - F$ and
independence from $F - S$, is cumulative [?]. It rests on the following theorem.

Theorem 1 If $F \cap S^F \subseteq F' \subseteq F$, then $S^{F'} = S^F$.

Proof: Assume $F \cap S^F \subseteq F' \subseteq F$. We must show that, for any stereotype S,
we have $d(F', S^F) \leq d(F', S)$. First, since $F \cap S^F \subseteq F'$, we have both
$F \cap S^F \subseteq F' \cap S^F$ and $S^F - F' \subseteq S^F - F$, therefore, by Equation (2), we have
$d(F', S^F) \leq d(F, S^F)$. By Equation (1), for any stereotype S, $d(F, S^F) \leq d(F, S)$
and therefore, for any S, $d(F', S^F) \leq d(F, S)$. Using, now, $F' \subseteq F$, we see that
$F' \cap S \subseteq F \cap S$ and $S - F \subseteq S - F'$. By Equation (2), then, $d(F, S) \leq d(F', S)$,
and $d(F', S^F) \leq d(F', S)$, for any stereotype S. By the uniqueness required in
assumption zero, $S^{F'} = S^F$. ∎

Corollary 1 The nonmonotonic consequence relation $\mid\!\sim$ defined by
stereotypical reasoning yielded by a distance that satisfies Equation (2) satisfies Cut and
Cautious Monotonicity and is therefore cumulative.

Proof: Suppose $\alpha \mid\!\sim \beta$. Let F be the set of models of $\alpha$ and F' be the set
of models of $\alpha \wedge \beta$. The assumption $\alpha \mid\!\sim \beta$ means that all elements of $F \cap S^F$
satisfy $\beta$, i.e., $F \cap S^F \subseteq F'$. But clearly $F' \subseteq F$. By Theorem 1, $S^{F'} = S^F$ and
$F \cap S^F = F' \cap S^{F'}$ and $\alpha \mid\!\sim \gamma$ iff $\alpha \wedge \beta \mid\!\sim \gamma$. ∎
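
The statement of Theorem 1 can be checked exhaustively on small instances. The Python sketch below does so for Example 3 (restricted to a hypothetical finite universe) and for Example 4 (with a hypothetical partition). Information sets for which the closest stereotype is not unique, i.e., for which assumption zero fails, are skipped.

```python
import math
from fractions import Fraction
from itertools import combinations

W = (0, 1, 2, 3, 4)  # hypothetical small universe


def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    return [frozenset(c) for r in range(len(universe) + 1)
            for c in combinations(universe, r)]


def closest(facts, stereotypes, dist):
    """Index of the unique closest stereotype, or None if it is not unique."""
    scores = [dist(facts, s, i) for i, s in enumerate(stereotypes)]
    best = min(scores)
    winners = [i for i, x in enumerate(scores) if x == best]
    return winners[0] if len(winners) == 1 else None


def theorem_1_holds(universe, stereotypes, dist):
    """Check: F ∩ S^F ⊆ F' ⊆ F implies S^{F'} = S^F, for all F and F'."""
    for f in subsets(universe):
        sf = closest(f, stereotypes, dist)
        if sf is None:  # assumption zero fails for this F: skip it
            continue
        core = f & stereotypes[sf]
        for f2 in subsets(universe):
            if core <= f2 <= f and closest(f2, stereotypes, dist) != sf:
                return False
    return True


# Example 3 on a finite W: singleton stereotypes, d(F, {n}) = n or infinity.
SINGLETONS = [frozenset({n}) for n in W]


def d_min(facts, s, i):
    (n,) = s
    return n if n in facts else math.inf


# Example 4: a partition, d(F, S_i) = |S_i - F| + i/k.
PARTITION = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4})]


def d_partition(facts, s, i):
    return len(s - facts) + Fraction(i, len(PARTITION))


print(theorem_1_holds(W, SINGLETONS, d_min))        # True
print(theorem_1_holds(W, PARTITION, d_partition))   # True
```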

Karl Schlechta [?] has found a cumulative consequence relation $\mid\!\sim$ that cannot
be defined by any stereotypical reasoning system yielded by a distance that
satisfies Equation (2). The exact characterization of those cumulative relations
that can be defined by stereotypical systems that satisfy Equation (2) is open. In
the next section, we shall discuss another basic logical property of nonmonotonic
systems described in [?], Or, i.e., preferentiality.

6 Preferentiality



Suppose each one of two information sets, F and F', enables us to conclude that
some formula $\alpha$ holds: all elements of $F \cap S^F$ and all elements of $F' \cap S^{F'}$
satisfy $\alpha$. Does this imply that the union $F \cup F'$ enables us to conclude $\alpha$, i.e.,
do all elements of $(F \cup F') \cap S^{F \cup F'}$ satisfy $\alpha$? The discussion of [?] explains
why this seems to be a natural property to expect. For example, assume we
would conclude that a bird that lives in the country flies, and that we would
also conclude that a bird that lives in a city flies. Must we conclude that birds
that live either in the country or in a city fly? Stereotypical reasoning does
not always satisfy this property, for the following reason. I guess that natural
common-sense reasoning does not either, for the same reason. Suppose $\beta$ and $\gamma$
describe very different situations, whose best stereotypes are different. It may
happen, nevertheless, that the same property $\alpha$ is shared both by the models
of $\beta \cap S^\beta$ and of $\gamma \cap S^\gamma$. But the best stereotype for $\beta \vee \gamma$ may be very general
and some models of, say, $\beta \cap S^{\beta \vee \gamma}$ may not satisfy $\alpha$. Intuitively, if the reasons
for concluding $\alpha$ from $\beta$ are very different from those for concluding $\alpha$ from $\gamma$,
there is little hope we shall be able to conclude $\alpha$ from the disjunction $\beta \vee \gamma$.
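
A concrete failure of this property can be exhibited with a partition distance in the style of Example 4. In the Python sketch below, the eight-element universe, the partition and the property `alpha` are hypothetical toy choices: each of the two information sets is best fitted by a specific stereotype, while their union is best fitted by a more general one.

```python
from fractions import Fraction

# Hypothetical partition of W = {0, ..., 7} into k = 3 stereotypes.
STEREOTYPES = [frozenset({0, 1, 2, 3}), frozenset({4, 5}), frozenset({6, 7})]


def distance(facts, i):
    """d(F, S_i) = |S_i - F| + i/k, as in Example 4."""
    return len(STEREOTYPES[i] - facts) + Fraction(i, len(STEREOTYPES))


def conclusions(facts):
    """F ∩ S^F for the closest stereotype S^F."""
    best = min(range(len(STEREOTYPES)), key=lambda i: distance(facts, i))
    return facts & STEREOTYPES[best]


def entails(facts, prop):
    """facts |~ prop iff prop holds in every element of F ∩ S^F."""
    return all(prop(w) for w in conclusions(facts))


def alpha(w):
    """Toy property: the world lies outside the general stereotype S_0."""
    return w >= 4


F = frozenset({0, 1, 4})  # closest stereotype: {4, 5}
G = frozenset({2, 3, 6})  # closest stereotype: {6, 7}

# prints: True True False
print(entails(F, alpha), entails(G, alpha), entails(F | G, alpha))
```

Both F and G yield `alpha`, for different reasons, but their union is closest to the general stereotype {0, 1, 2, 3} and `alpha` is lost.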

There is one interesting case, though, the case $S^F = S^{F'}$, in which the desired
conclusion follows if we strengthen one of the assumptions above. Since the
function e does not depend on its third argument, by Assumption three, we
shall write it as a function of two arguments. Let us assume:

• Assumption four: $e(A \cup A', B) = \min\{e(A, B), e(A', B)\}$.

Clearly, assumption one already implies $e(A \cup A', B) \leq \min\{e(A, B), e(A', B)\}$
and assumption four implies assumption one.

Theorem 2 Let assumptions zero-four be satisfied. If $S^F = S^{F'}$, then $S^{F \cup F'} = S^F$.

Proof: Notice that we do not claim that the nonmonotonic consequence
relation defined is preferential. Assume $S^F = S^{F'}$. We must show that, for any
stereotype S, we have $d(F \cup F', S^F) \leq d(F \cup F', S)$. Indeed,

$d(F \cup F', S^F) = e((F \cap S^F) \cup (F' \cap S^F), S^F - (F \cup F')) \leq
e(F \cap S^F, S^F - F) = d(F, S^F) \leq d(F, S)$.

Similarly $d(F \cup F', S^F) \leq d(F', S)$ and

$d(F \cup F', S^F) \leq \min\{d(F, S), d(F', S)\} = d(F \cup F', S)$.

The last equality stems from assumption four. ∎

Consider our examples above. In Example 1 the function d is constant and
therefore satisfies assumption four. The consequence relation, being classical, is
in fact preferential. In Example 2, the function d proposed does not satisfy
assumption four; nevertheless the relation defined is preferential. In Example 3
the function d satisfies assumption four. The consequence relation defined
is in fact preferential. In Example 4, assumption four holds, and
therefore so do the conclusions of Theorem 2, but the consequence relation defined is
not preferential.

7 Conclusion



A formal description of stereotypical reasoning has been provided. Under rea- 
sonable assumptions about the way stereotypes are attached to information sets, 
this model yields a cumulative system. The assumptions one-three concerning 
the distance between information sets and stereotypes may perhaps be tested 
experimentally. Preferentiality has been discussed, and found implausible in
general, but a more limited natural property has been brought out. Again,
preferentiality should be tested experimentally. The conditions proposed above 
that imply good logical behavior are sufficient but not necessary. Other con- 
ditions may be more natural and also sufficient. The structure of the set S 
of stereotypes, in particular, has been left completely arbitrary. A reasonable 
assumption may be that this set has a tree structure: i.e., that if S and T are
any two stereotypes such that the intersection $S \cap T$ is not empty, then $S \subseteq T$
or $T \subseteq S$.